
    Regression tree models for designed experiments

    Although regression trees were originally designed for large datasets, they can profitably be used on small datasets as well, including those from replicated or unreplicated complete factorial experiments. We show that in the latter situations, regression tree models can provide simpler and more intuitive interpretations of interaction effects as differences between conditional main effects. We present simulation results to verify that the models can yield lower prediction mean squared errors than traditional techniques. The tree models span a wide range of sophistication, from piecewise constant to piecewise simple and multiple linear, and from least squares to Poisson and logistic regression.
    Published at http://dx.doi.org/10.1214/074921706000000464 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
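    The conditional-main-effect reading of interactions can be illustrated with off-the-shelf tools. The sketch below is not the authors' method; it fits scikit-learn's CART regressor to a simulated replicated 2^3 factorial (factor names A, B, C and all coefficients are made-up assumptions) so the printed tree can be read in that way.

```python
# Illustrative sketch only: approximates the idea of reading an interaction as
# differences between conditional main effects, using an ordinary CART regressor
# on a simulated replicated 2^3 factorial. All names and coefficients are invented.
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Replicated 2^3 full factorial: factors A, B, C at levels -1/+1, 4 replicates each.
levels = np.array(list(product([-1, 1], repeat=3)))
X = np.repeat(levels, 4, axis=0)
# True model with an A:B interaction.
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 3.0 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, len(X))

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=4).fit(X, y)
print(export_text(tree, feature_names=["A", "B", "C"]))
# The tree typically splits first on A and then on B within each branch, so the
# B effect can be read off separately for A = -1 and A = +1, i.e. the A:B
# interaction appears as a difference between conditional main effects of B.
```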

    PLUTO: Penalized Unbiased Logistic Regression Trees

    We propose a new algorithm called PLUTO for fitting logistic regression trees to binary response data. PLUTO can capture the nonlinear and interaction patterns in messy data by recursively partitioning the sample space. It fits a simple or a multiple linear logistic regression model in each partition. PLUTO employs the cyclical coordinate descent method for estimation of multiple linear logistic regression models with elastic net penalties, which allows it to deal with high-dimensional data efficiently. The tree structure provides a graphical description of the data. Together with the logistic regression models, it provides an accurate classifier as well as a piecewise smooth estimate of the probability of "success". PLUTO controls selection bias by (1) separating split variable selection from split point selection, and (2) applying an adjusted chi-squared test to find the split variable instead of an exhaustive search. A bootstrap calibration technique is employed to further correct selection bias. Comparison on real datasets shows that, on average, the multiple linear PLUTO models predict more accurately than other algorithms.
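    PLUTO itself is not packaged in scikit-learn, but its two ingredients, a partition of the sample space plus an elastic-net penalized logistic model per partition, can be approximated as in the sketch below. The partitioning here is ordinary CART rather than PLUTO's bias-corrected chi-squared split selection, scikit-learn's saga solver stands in for cyclical coordinate descent, and the dataset and parameters are illustrative assumptions.

```python
# Rough stand-in for the PLUTO idea: a shallow partition of the sample space,
# then an elastic-net penalized multiple logistic regression fitted in each leaf.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=50, n_informative=8, random_state=1)

# Step 1: coarse partition of the sample space (depth-2 tree -> up to 4 leaves).
partition = DecisionTreeClassifier(max_depth=2, min_samples_leaf=200, random_state=1).fit(X, y)
leaf_ids = partition.apply(X)

# Step 2: elastic-net logistic regression per leaf (saga solver used here in
# place of PLUTO's cyclical coordinate descent).
leaf_models = {}
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    model = LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, C=1.0, max_iter=5000)
    leaf_models[leaf] = model.fit(X[mask], y[mask])

# Prediction routes each observation to its leaf's logistic model,
# giving a piecewise-smooth estimate of P(success).
def predict_proba(X_new):
    leaves = partition.apply(X_new)
    p = np.empty(len(X_new))
    for leaf, model in leaf_models.items():
        m = leaves == leaf
        if m.any():
            p[m] = model.predict_proba(X_new[m])[:, 1]
    return p

print(predict_proba(X[:5]))
```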

    A Machine-Learning Classification Tree Model of Perceived Organizational Performance in U.S. Federal Government Health Agencies

    Perceived organizational performance (POP) is an important factor that influences employees’ attitudes and behaviors such as retention and turnover, which in turn improve or impede organizational sustainability. The current study aims to identify interaction patterns of risk factors that differentiate public health and human services employees who perceived their agency performance as low. The 2018 Federal Employee Viewpoint Survey (FEVS), a nationally representative sample of U.S. federal government employees, was used for this study. The study included 43,029 federal employees (weighted n = 75,706) across 10 sub-agencies in the public health and human services sector. Machine-learning classification decision-tree modeling identified several tree-splitting variables and classified employees into 33 subgroups: 2 high-risk, 6 moderate-risk, and 25 low-risk subgroups of POP. The important variables predicting POP included performance-oriented culture, organizational satisfaction, organizational procedural justice, task-oriented leadership, work security and safety, and employees’ commitment to their agency; these variables interacted with one another in predicting risks of POP. Complex interaction patterns in high- and moderate-risk subgroups, the importance of a machine-learning approach to sustainable human resource management in Industry 4.0, and the limitations and future research are discussed.
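    A rough sense of the leaf-as-subgroup workflow can be given with a weighted classification tree. The sketch below uses simulated data; the column names (perf_culture, org_satisfaction, and so on) are hypothetical stand-ins for the FEVS constructs named in the abstract, not the study's actual variables, weights, or tuning.

```python
# Hedged sketch of a weighted classification tree whose leaves define risk
# subgroups. Data and column names are simulated stand-ins, not FEVS variables.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "perf_culture":       rng.normal(size=n),
    "org_satisfaction":   rng.normal(size=n),
    "procedural_justice": rng.normal(size=n),
    "task_leadership":    rng.normal(size=n),
    "work_safety":        rng.normal(size=n),
    "survey_weight":      rng.uniform(0.5, 3.0, size=n),   # survey-style sampling weights
})
# Simulated outcome: 1 = employee perceives agency performance as low.
logit = -1.0 - 1.2 * df["perf_culture"] - 0.8 * df["org_satisfaction"]
df["low_pop"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

features = ["perf_culture", "org_satisfaction", "procedural_justice",
            "task_leadership", "work_safety"]
tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100, random_state=0)
tree.fit(df[features], df["low_pop"], sample_weight=df["survey_weight"])

# Each leaf defines a subgroup; its weighted rate of low POP is the subgroup's
# risk, which can then be binned into high / moderate / low risk.
df["leaf"] = tree.apply(df[features])
df["w_low"] = df["low_pop"] * df["survey_weight"]
risk = df.groupby("leaf")["w_low"].sum() / df.groupby("leaf")["survey_weight"].sum()
print(risk.sort_values(ascending=False).head())
```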

    Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables

    We describe and evaluate a regression tree algorithm for finding subgroups with differential treatment effects in randomized trials with multivariate outcomes. The data may contain missing values in the outcomes and covariates, and the treatment variable is not limited to two levels. Simulation results show that the regression tree models have unbiased variable selection and that the estimates of subgroup treatment effects are approximately unbiased. A bootstrap calibration technique is proposed for constructing confidence intervals for the treatment effects. The method is illustrated with data from a longitudinal study comparing two diabetes drugs and a mammography screening trial comparing two treatments and a control.
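    A simplified sketch of the subgroup idea follows; it is not the paper's algorithm. It partitions a simulated randomized trial with an ordinary CART, estimates the within-leaf treatment effect as a difference in arm means, and attaches naive percentile bootstrap intervals, whereas the paper's bootstrap calibration additionally corrects such intervals for post-selection bias.

```python
# Simplified stand-in: CART partition of a simulated two-arm trial, with
# per-leaf treatment-effect estimates and naive percentile bootstrap intervals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 3000
X = rng.normal(size=(n, 5))
treat = rng.integers(0, 2, size=n)               # randomized 0/1 treatment
# True effect is larger when X[:, 0] > 0 (a differential-effect subgroup).
tau = np.where(X[:, 0] > 0, 2.0, 0.5)
y = X[:, 1] + tau * treat + rng.normal(size=n)

# Partition on covariates with the outcome as the tree response (the paper's
# method instead uses a dedicated, selection-bias-free splitting rule).
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=300, random_state=0).fit(X, y)
leaves = tree.apply(X)

for leaf in np.unique(leaves):
    idx = leaves == leaf
    est = y[idx & (treat == 1)].mean() - y[idx & (treat == 0)].mean()
    rows = np.flatnonzero(idx)
    boots = []
    for _ in range(500):                         # naive percentile bootstrap
        b = rng.choice(rows, size=rows.size, replace=True)
        yb, tb = y[b], treat[b]
        boots.append(yb[tb == 1].mean() - yb[tb == 0].mean())
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"leaf {leaf}: effect {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```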